Analysis of Parallel Structures in Patent Sentences, Focusing on the Head Words
نویسنده
چکیده
One of the characteristics of patent sentences is long, complicated modifications. A modification is identified by the presence of a head word in the modifier. We extracted head words with a high occurrence frequency from about 1 million patent sentences. Based on the results, we constructed a modifier correcting system using these head words. About 60% of the errors could be modified with our system.
منابع مشابه
Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations
The main task of the tokenization is to divide the sentences of the text into its constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical writing chain that is an independent semantic unit. Tokenization occurs at the word level and the extracted units can be used as input to other components such as stemmer. The requirement to create...
متن کاملIdentification of BKCa channel openers by molecular field alignment and patent data-driven analysis
In this work, we present the first comprehensive molecular field analysis of patent structures on how the chemical structure of drugs impacts the biological binding. This task was formulated as searching for drug structures to reveal shared effects of substitutions across a common scaffold and the chemical features that may be responsible. We used the SureChEMBL patent database, which prov...
متن کاملAcquisition of cleft structures in L1 and L2
The present study aims at exploring the processing difficulty of cleft structures as a type of relative clause for EFL and Persian as first language learners.The impact of head nouns with various functions as well as that of embedding on the processing of Persian and English cleft structures has been investigated in the present study.The participants were 68 Iranian male and female students...
متن کاملبازشناسی متون فارسی با استفاده از مدل زبانی n-gram و پالایش گرامری
Abstract Text recognition has been one of the growing research topics in recent years. Many of these researches have focused on recognition of letters and sub-words as a basis for identifying larger text structures such as words, phrases and sentences. This thesis presents a new method in which the recognized sub-words are combined in order to provide meaningful words and sentences in Farsi tex...
متن کاملA Syntactic Analysis Method of Long Japanese Sentences Based on the Detection of Conjunctive Structures
This paper presents a syntactic analysis method that first detects conjunctive structures in a sentence by checking parallelism of two series of words and then analyzes the dependency structure of the sentence with the help of the information about the conjunctive structures. Analysis of long sentences is one of the most difficult problems in natural language processing. The main reason for thi...
متن کامل